Search CORE

62 research outputs found

DPI over commodity hardware: implementation of a scalable framework using FastFlow

Author: DE SENSI DANIELE
Publication venue: 'Pisa University Press'
Publication date: 20/02/2013
Field of study

In the last years we assisted to a large increase of the number of applications running on top of IP networks. Consequently the need to implement very efficient monitoring solutions that can manage these high data rates and that can classify the type of traffic which is traveling over the network has increased. For example, as far as network security is concerned, in the recent years we have seen a shift from so-called "network-level" attacks, which target the network they are transported on (e.g. Denial of Service), to content-based threats which exploit applications vulnerabilities and require sophisticated levels of intelligence to be detected. For some of these threats, it is no more sufficient to have only a software solution on the client side but we also need to run some controls on the network itself. To manage these kinds of scenarios, payload inspection is often required in order to correctly identify the application protocol and to process the data carried over it. This is the reason why, in recent years, Deep Packet Inspection (DPI) technology has emerged. This kind of processing is in many cases implemented, at least in part, through dedicated hardware. However, full software solutions may often be more appealing because they are typically more economical and have, in general, the capability to react faster to protocols evolution and changes. Moreover, software solutions which run over general purpose hardware do not exploit the underlying multiprocessor architecture, providing only the capability to process the incoming packets sequentially. Furthermore, many DPI research works that can be found in literature and which exploits multicore architectures are often characterized by a poor scalability, due to the overhead required for synchronization and to load unbalance among the used cores. In this thesis, we will describe the design and implementation of a DPI framework capable of managing current networks rates using commodity multicore hardware. Our framework provides the possibility to identify the protocol, to specify the kind of data to extract when it has been identified and how these data has to be processed. Differently from existing works, the developed framework has been designed according to the structured parallel programming theory, allowing thus to completely hide to the user the complexity of the management of the problems related to an efficient exploitation of the underlying architecture. These concepts have then been applied using FastFlow, a library for structured parallel programming targeting both shared memory and distributed memory architectures

Electronic Thesis and Dissertation Archive - Università di Pisa

A power-aware, self-adaptive macro data flow framework

Author: Danelutto Marco
De Sensi Daniele
Torquati Massimo
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2017
Field of study

The dataflow programming model has been extensively used as an effective solution to implement efficient parallel programming frameworks. However, the amount of resources allocated to the runtime support is usually fixed once by the programmer or the runtime, and kept static during the entire execution. While there are cases where such a static choice may be appropriate, other scenarios may require to dynamically change the parallelism degree during the application execution. In this paper we propose an algorithm for multicore shared memory platforms, that dynamically selects the optimal number of cores to be used as well as their clock frequency according to either the workload pressure or to explicit user requirements. We implement the algorithm for both structured and unstructured parallel applications and we validate our proposal over three real applications, showing that it is able to save a significant amount of power, while not impairing the performance and not requiring additional effort from the application programmer

Archivio della Ricerca - Università di Pisa

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Mammut: High-level management of system knobs and sensors

Author: Danelutto Marco
DE SENSI Daniele
Torquati Massimo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

Managing low-level architectural features for controlling performance and power consumption is a growing demand in the parallel computing community. Such features include, but are not limited to: energy profiling, platform topology analysis, CPU cores disabling and frequency scaling. However, these low-level mechanisms are usually managed by specific tools, without any interaction between each other, thus hampering their usability. More important, most existing tools can only be used through a command line interface and they do not provide any API. Moreover, in most cases, they only allow monitoring and managing the same machine on which the tools are used. MAMMUT provides and integrates architectural management utilities through a high-level and easy-to-use object-oriented interface. By using MAMMUT, is possible to link together different collected information and to exploit them on both local and remote systems, to build architecture-aware applications

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

A Reconfiguration Algorithm for Power-Aware Parallel Applications

Author: DANELUTTO MARCO
DE SENSI DANIELE
TORQUATI MASSIMO
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

In current computing systems, many applications require guarantees on their maximum power consumption to not exceed the available power budget. On the other hand, for some applications, it could be possible to decrease their performance, yet maintain an acceptable level, in order to reduce their power consumption. To provide such guarantees, a possible solution consists in changing the number of cores assigned to the application, their clock frequency, and the placement of application threads over the cores. However, power consumption and performance have different trends depending on the application considered and on its input. Finding a configuration of resources satisfying user requirements is, in the general case, a challenging task. In this article, we propose Nornir, an algorithm to automatically derive, without relying on historical data about previous executions, performance and power consumption models of an application in different configurations. By using these models, we are able to select a close-to-optimal configuration for the given user requirement, either performance or power consumption. The configuration of the application will be changed on-the-fly throughout the execution to adapt to workload fluctuations, external interferences, and/or application’s phase changes. We validate the algorithm by simulating it over the applications of the Parsec benchmark suit. Then, we implement our algorithm and we analyse its accuracy and overhead over some of these applications on a real execution environment. Eventually, we compare the quality of our proposal with that of the optimal algorithm and of some state-of-the-art solutions

Crossref

Archivio della Ricerca - Università di Pisa

Towards Power-Aware Data Pipelining on Multicores

Author: Aldinucci Marco
Daniele De Sensi
Gabriele Mencagli
Marco Danelutto
Massimo Torquati
Publication venue: 'Universidad de Valladolid'
Publication date: 01/01/2017
Field of study

Institutional Research Information System University of Turin

Power-Aware Pipelining with Automatic Concurrency Control

Author: Aldinucci Marco
Danelutto Marco
Daniele De Sensi
Gabriele Mencagli
Massimo Torquati
Publication venue: 'Wiley'
Publication date: 01/01/2019
Field of study

Institutional Research Information System University of Turin

Flare: Flexible In-Network Allreduce

Author: Ashkboos Saleh
De Sensi Daniele
Di Girolamo Salvatore
Hoefler Torsten
Li Shigang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/06/2021
Field of study

The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, that aggregate the data received from the hosts, and send them back the aggregated result. However, existing solutions provide limited customization opportunities and might provide suboptimal performance when dealing with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To deal with these problems, in this work we design a flexible programmable switch by using as a building block PsPIN, a RISC-V architecture implementing the sPIN programming model. We then design, model, and analyze different algorithms for executing the aggregation on this architecture, showing performance improvements compared to state-of-the-art approaches

arXiv.org e-Print Archive

Analysing Multiple QoS Attributes in Parallel Design Patterns-Based Applications

Author: Brogi Antonio
Danelutto Marco
de Sensi Daniele
Ibrahim Ahmad
Soldani Jacopo
Torquati Massimo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

Parallel design patterns can be fruitfully combined to develop parallel software applications. Different combinations of patterns can feature different QoS while being functionally equivalent. To support application developers in selecting the best combinations of patterns to develop their applications, we hereby propose a probabilistic approach that permits analysing, at design time, multiple QoS attributes of parallel design patterns-based application. We also present a proof-of-concept implementation of our approach, together with some experimental results

Archivio della Ricerca - Università di Pisa

Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees

Author: De Sensi Daniele
Di Girolamo Salvatore
Hoefler Torsten
Molero Edgar Costa
Vanbever Laurent
Publication venue
Publication date: 28/09/2023
Field of study

The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated together and then broadcasted to each host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network. Switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. However, this forces packets to traverse a predefined set of switches, making these solutions prone to congestion. For this reason, we design Canary, the first congestion-aware in-network allreduce algorithm. Canary uses load balancing algorithms to forward packets on the least congested paths. Because switches do not know from which ports they will receive the data to aggregate, they use timeouts to aggregate the data in a best-effort way. We develop a P4 Canary prototype and evaluate it on a Tofino switch. We then validate Canary through simulations on large networks, showing performance improvements up to 40% compared to the state-of-the-art

arXiv.org e-Print Archive